21 research outputs found

    A Maximum-Entropy Partial Parser for Unrestricted Text

    This paper describes a partial parser that assigns syntactic structures to sequences of part-of-speech tags. The program uses the maximum entropy parameter estimation method, which allows a flexible combination of different knowledge sources: the hierarchical structure, parts of speech and phrasal categories. In effect, the parser goes beyond simple bracketing and recognises even fairly complex structures. We give accuracy figures for different applications of the parser. Comment: 9 pages, LaTe

    Chunk Tagger - Statistical Recognition of Noun Phrases

    We describe a stochastic approach to partial parsing, i.e., the recognition of syntactic structures of limited depth. The technique utilises Markov Models, but goes beyond the usual bracketing approaches, since it is capable of recognising not only the boundaries, but also the internal structure and syntactic category of simple as well as complex NPs, PPs, APs and adverbials. We compare tagging accuracy for different applications and encoding schemes. Comment: 7 pages, LaTe
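For orientation, the kind of bracketing baseline the abstract above contrasts itself with can be sketched as a per-token tag encoding. The sketch below uses the widely known IOB scheme, which is an assumption chosen for illustration and not necessarily one of the encoding schemes compared in the paper. It records only chunk boundaries and categories, not internal structure. The function names `chunks_to_iob` and `iob_to_chunks` and the example sentence are hypothetical.

```python
def chunks_to_iob(tokens, chunks):
    """Encode flat chunks as one tag per token.

    chunks: list of (start, end, label) spans, end exclusive, non-overlapping.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in chunks:
        tags[start] = "B-" + label            # first token of the chunk
        for i in range(start + 1, end):
            tags[i] = "I-" + label            # remaining tokens of the chunk
    return tags


def iob_to_chunks(tags):
    """Decode a tag sequence back into (start, end, label) spans."""
    chunks = []
    start = label = None
    for i, tag in enumerate(tags + ["O"]):    # sentinel closes a trailing chunk
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            chunks.append((start, i, label))  # current chunk ends before i
            start = label = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return chunks


# Illustrative sentence and chunking (made up for this sketch)
tokens = ["the", "old", "man", "sat", "on", "the", "bench"]
chunks = [(0, 3, "NP"), (4, 5, "PP"), (5, 7, "NP")]
tags = chunks_to_iob(tokens, chunks)
```

Once chunks are expressed as tags like this, any sequence tagger (for example a Markov model tagger) can in principle be trained to predict them. Richer tag inventories would be needed to capture the internal structure the abstract refers to.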

    Preference-Driven Bimachine Compilation: An Application to TTS Text Normalisation

    This paper describes a grammar formalism and a deterministic parser developed for text normalisation in the rVoice text-to-speech (TTS) system. The rules are formulated using regular expressions and converted into a non-deterministic finite-state transducer (FST). At runtime, search is guided by parsing preferences which the user may associate with regular operators; the best solution is determined in a way similar to the directional evaluation of constraints in Optimality Theory. During compilation, the FST is converted into a bimachine, making deterministic parsing possible.

    Incremental Construction of Minimal Sequential Transducers: The Unsorted-Data Algorithm for Acyclic Sequential Transducers

    This paper presents an efficient algorithm for the incremental construction of a minimal acyclic sequential transducer (ST) from a list of input and output strings. The algorithm generalizes a known method of constructing minimal finite-state automata (Daciuk, Mihov, Watson and Watson 2000). Unlike the algorithm published by Mihov and Maurel (2001), it does not require the input strings to be sorted in advance. The algorithm is illustrated by an application in a text-to-speech system.
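To make the data structure in the abstract above concrete, here is a minimal sketch of an acyclic sequential transducer built incrementally from unsorted input/output pairs. It is not the paper's algorithm: it builds only a prefix tree with outputs pushed as far towards the root as possible, and it omits the minimisation step (merging equivalent states) that the paper contributes. The names `State`, `insert` and `translate` and the abbreviation pairs are assumptions made for illustration.

```python
class State:
    def __init__(self):
        self.edges = {}    # input symbol -> [output string, target State]
        self.final = None  # final output string, or None if non-final


def common_prefix(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def insert(root, word, output):
    """Add the pair (word, output); words may arrive in any order."""
    state, remaining = root, output
    for symbol in word:
        if symbol not in state.edges:
            # New branch: emit everything still owed on the first new edge.
            state.edges[symbol] = [remaining, State()]
            remaining = ""
        else:
            edge = state.edges[symbol]
            edge_out, target = edge
            prefix = common_prefix(edge_out, remaining)
            suffix = edge_out[len(prefix):]
            if suffix:
                # The edge emits more than both strings share:
                # keep the shared prefix here and push the rest downstream.
                edge[0] = prefix
                for out_edge in target.edges.values():
                    out_edge[0] = suffix + out_edge[0]
                if target.final is not None:
                    target.final = suffix + target.final
            remaining = remaining[len(prefix):]
        state = state.edges[symbol][1]
    state.final = remaining


def translate(root, word):
    """Return the output for word, or None if word is not accepted."""
    state, out = root, []
    for symbol in word:
        if symbol not in state.edges:
            return None
        edge_out, state = state.edges[symbol]
        out.append(edge_out)
    if state.final is None:
        return None
    return "".join(out) + state.final


# Illustrative, deliberately unsorted abbreviation pairs (made up for this sketch)
root = State()
for word, output in [("mr", "mister"), ("dr", "doctor"), ("mrs", "missus"), ("st", "street")]:
    insert(root, word, output)
```

After these insertions, the edge for `m` emits only the shared prefix `mis`, while the diverging parts `ter` and `sus` sit further down. This output pushing is what keeps the machine sequential, that is, deterministic on its input.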